Example of an analysis, start-to-finish (V2, quap)
Author
Jesse Brunner
Published
January 22, 2023
Sometimes it can be difficult to put all of the pieces together when they are dealt with separately, so here I want to demonstrate how we might approach analyzing a simple study in a Bayesian context. First, let’s explain the system and research question.
The system and question
My former M.S. student, Mitch Le Sage, was interested in the role that scavenging invertebrates might play in the transmission and dynamics of Ranavirus epidemics. His general hypothesis was that by removing infectious carcasses, scavengers might reduce transmission rates among amphibian larvae in ponds.
One question that came out of this research was whether invertebrate scavengers, such as dytiscid beetle larvae, were capable of removing many amphibian carcasses. Eating one carcass in a lab setting is one thing, but what if there were a lot of carcasses, as in a die-off? Would the scavengers be able to keep up?
Mitch and an undergraduate in the lab set up an experiment where dytiscid beetle larvae were housed in small aquaria with 1, 2, 5, 8, or 10 carcasses of Ambystoma macrodactylum larvae to see how much was consumed in a 48-hour period. Since carcasses are scavenged, rather than consumed whole, he measured the starting and ending mass of the carcasses.
This sort of experiment falls reasonably well under the heading of “Functional Responses”, or perhaps you have heard of the “Holling Disc equations.” The history of these is pretty fun1, but for our purposes we can just consider two versions of how the foraging (or scavenging) rate increases with the availability of food (or carcasses).
First, we might simply assume that the foraging rate increases with the density of food. If there are more carcasses in the environment a scavenger will find one to consume more quickly. We could write:
\[
\mu_i = a \times c_i,
\] meaning that the average or expected amount of carcass mass consumed in a particular aquarium (\(\mu_i\)) increases linearly with the amount of carcass mass available (\(c_i\)) at rate \(a\). The parameter \(a\) is often called the “attack rate.” This is called, very creatively, the Type I functional response.
Secondly, we might assume that the rate of carcass removal saturates or levels off simply because a scavenger can only eat carcass so fast. At low carcass densities the scavenging rate is limited by the time it takes to find a carcass, but when carcasses are abundant the rate of removal is instead limited by handling time, \(h\). A classic way to frame this so-called Type II functional response2 is:
\[
\mu_i = \frac{a \times c_i}{1+a\times c_i\times h}
\] We wanted to see which of these models best described carcass removal and then to extrapolate our findings, including our parameter uncertainty, to the pond-level.
Simulating data from the Type I functional response
Let’s start not by looking at their data, but instead by making up some data. We’ll begin with just the linear model, \(\mu_i = ac_i,\) but can then repeat the pattern with the second model.
First, our “predictor” data: The starting masses of the carcasses ranged from a low of ~1/5th of a gram to ~10 grams, so we’ll simulate a data set over this range of carcass values. Since we used 18 aquaria in the study, we’ll use that many observations in our simulations, too.
n <- 18  # number of aquaria
carc <- runif(n, min = 0.2, max = 10)
carc
So these are some made up observations of initial carcass masses. There are two things that are quite unrealistic. First, we have a huge number of digits to the right of the decimal place, but that’s probably not something to worry about. Second, since we actually added 1, 2, 5, 8, or 10 carcasses in a tank it’s probably unrealistic to assume that all carcass masses are equally likely, even though the carcasses varied a bit in size. We could just run with this—it may not matter a lot—or we could more realistically simulate our carcass masses.
Indeed, looking back at the paper, we actually used different numbers of replicates for each number of carcasses, so let’s try to make this more realistic.
# actual numbers of carcasses used in each replicate
n_carc <- c( rep(1, 5), rep(2, 5), rep(5, 3), rep(8, 3), rep(10, 2) )
# now let's draw random carcass sizes
carc <- rnorm(n = 18, mean = 1*n_carc, sd = 0.25)  # mean of 1 g/carcass
carc
That actually does a reasonable job of recapitulating the actual carcass masses!
Next, we need to simulate how much of the initial carcass mass is removed. Let’s pick an attack rate. You may not know much about what this should be, but you can at least make some basic assumptions. It cannot be greater than 1, as that would imply that more carcasses are eaten than available. It must be greater than zero. So we can at least say that \(1 \geq a \geq 0\). If we thought some more about the math and the system we could probably do better—for instance, only a fraction of the carcass is scavengable; the cartilage is probably not—but for a start, let’s just say \(a=1/2\).
a <- 0.5  # attack rate in g/48h
mu <- a * carc
Let’s plot our new “data”.
plot(mu ~ carc)
And…it’s a line. I guess that shouldn’t be surprising, right? We just specified that it would be! But our data, that is our observations, should not follow exactly along this predicted or expected line, right? I mean, we’d expect a bit of noise both from measuring carcass remains3 and from differences in, among other things, the appetite of a dytiscid beetle larva, how good or bad it is at finding carcasses, and vagaries of where the carcasses end up relative to the beetle larva. Let’s therefore simulate observations from our expectation.
I would think that these slight variations are probably normally distributed. Unless we have a good reason to think otherwise, this is usually a reasonable starting point. We have our mean or expected value, \(\mu_i\), but we need to describe some variation from this expectation. As a start, let’s say \(\sigma = 1\).
obs <- rnorm(n, mean = mu, sd = 1)
plot(obs ~ carc, col = "red")   # observations
lines(mu ~ carc)                # "expected" relationship
abline(a = 0, b = 1, lty = 2)   # a 1:1 line just for context
That is a fair bit of variation! Moreover, some values are now going negative! That won’t work!
So what are our options? Well, we could turn down the noise (standard deviation or \(\sigma\)) to ensure that no observations are less than zero. We could also just set every observation that is less than zero to zero. However, both of these strategies seem like we are imposing arbitrary limits on the variability in observations. Plus, with ad hoc approaches like the second one, coding things gets hard (especially in the language of our Bayesian models).4 A better option would be to use a distribution of observations that ensures positive values.
One that comes readily to mind is the lognormal. Indeed, it is just a normal distribution that is exponentiated. Let’s see what that looks like:
# Normal distribution
hist(rnorm(1e4), breaks = 50)
# exponentiated normal distribution
hist(exp(rnorm(1e4)), breaks = 50, border = "blue", col = NA)
# the lognormal distribution is the same as the exponentiated normal
hist(rlnorm(1e4), breaks = 50, border = "red", col = NA, add = TRUE)
See?
Note, however, that converting between the standard deviation of the normal and lognormal is a bit weird. A standard deviation of 1 on the normal, linear scale would imply a standard deviation of \(\log(1) = 0\) on the log scale, but that would mean no deviation at all! Anyway, this is why we simulate, so we can get a sense of what these numbers imply.
obs <- rlnorm(n, meanlog = log(mu), sdlog = 1/2)
plot(obs ~ carc, col = "red")   # observations
lines(mu ~ carc)                # predicted line
abline(a = 0, b = 1, lty = 2)   # a 1:1 line just for context
So that’s a bit better, but perhaps too much noise. Let’s tone it down a bit.
obs <- rlnorm(n, meanlog = log(mu), sdlog = 1/3)
plot(obs ~ carc, col = "red")   # observations
lines(mu ~ carc)                # predicted line
abline(a = 0, b = 1, lty = 2)   # a 1:1 line just for context
points(obs ~ carc, col = "blue")  # new observations
Any points above the 1:1 line might be interpreted as measurement error on the initial or final carcass measurements. It’s probably not worth getting too worked up about these issues5; we have data that are fairly realistic and avoid real issues like negative values.
Simulating across possible values of \(a\) (posterior prediction)
So far we have simply picked a single value of the parameters \(a\) and \(\sigma_{\log}\) (my fancy name for the sdlog parameter in the rlnorm() function call, above). But we do not necessarily know what these values are. Instead, we can simulate different data sets from different underlying expected relationships between the initial and consumed carcass mass based on different values of the parameters. Seem like a lot? It’s not too bad.
Let’s start by focusing on simulating a bunch of values of \(a\) and seeing what those imply about the expected relationship between initial and consumed carcass mass. We can keep our guess of 0.5 as maybe our mean, but allow these values to change a bit around this value.
as <- rnorm(n = 20, mean = 0.5, sd = 0.25)  # twenty values of a
Let’s see what these look like:
plot(x = carc, y = carc, type = "n")  # just to set the scale of the graph
abline(a = 0, b = 1, lty = 2)  # one-to-one line... slope is parameter b
for(i in 1:20){
  abline(a = 0, b = as[i], col = "gray")
}
So we have twenty different possible values of the attack rate, \(a\), which is the slope of these lines. They cover a wide range of possibilities, from (in my simulation) almost zero to almost one, but most are in the middle. I guess without knowing any more than this, the results seem pretty OK to me. Now, however, we would like to simulate different data sets given these different possible values of the attack rate. Let’s start with 10 different data sets, though we could of course change this pretty easily.
We want to construct a data frame with ten data sets, each with 18 observations, and each data set generated from a different possible value of \(a\) and \(\sigma_{\log}\). Let’s build this up.
df <- data.frame(
  # our index of the data set number
  d = rep(1:10, each = 18),
  # the initial carcass mass; same as above, only we replicate
  # the n_carc vector 10 times, once for each data set
  carc = rnorm(n = 18*10, mean = 1*rep(n_carc, times = 10), sd = 0.25)
)
head(df)
hist(df$carc, breaks = 30)  # can see the peaks in carcass mass around 1-3, 5, 8, and 10 carcasses
This data frame so far just has our predictor or independent variable, initial carcass weights.
Now we need to match our values of \(a\) with our different replicate data sets, one value for each data set, and use that to calculate expected values.
df$a <- as[df$d]
See what I did? as is a vector of possible values of the parameter \(a\). I want to use the first value in that vector for the first replicate data set, the second value for the second data set, and so on. Since the d column is the index of the replicate we can just use that to pull out the appropriate entry.
We can then calculate the expected value of the carcasses consumed, mu, for each value of carc using the replicate-specific value of a
df$mu <- df$carc * df$a
plot(mu ~ carc, data = df)
abline(a = 0, b = 1, lty = 2)  # one-to-one line... slope is parameter b
for(i in 1:20){
  abline(a = 0, b = as[i], col = "gray")
}
So we can see that we now have 10 data sets with different initial carcass densities (\(x\)-values) and different slopes (values of \(a\)). However, we still do not yet have data sets of observations. For that, we need to simulate observations from the expected values given some level of noise. We were using a lognormal distribution, but above we had just used a single value of sdlog = 1/2 (\(\sigma_{\log}\)). We do not know that this value is right. Shouldn’t we explore the effect of different possible values of this parameter, too?
Since \(\sigma_{\log}\) must be greater than zero, but shouldn’t go to crazy high values, let’s assume it is exponentially distributed with a mean of 1/2. The exponential distribution has a single parameter, \(\lambda\), often called the rate parameter. The average of the distribution is simply \(1/\lambda\), so for a mean of 1/2 we should use rate = 2.
# possible values of sdlog
hist(rexp(n = 1e4, rate = 2), breaks = 20)
We can use the same trick as above to assign a different value of \(\sigma_{\log}\) to each replicate data set.
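The chunk that does this appears to have gone missing here; a sketch of what it presumably looked like, reusing the indexing trick (the name `sdlogs` matches its later use, e.g., `sdlogs[1]` in the histogram below):

```r
sdlogs <- rexp(n = 10, rate = 2)  # one value of sdlog per replicate data set
df$sdlog <- sdlogs[df$d]          # assign by the data set index, as with a

# simulate observations around each replicate's expected values
df$obs <- rlnorm(n = nrow(df), meanlog = log(df$mu), sdlog = df$sdlog)
```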
If we plot these observations (in red) we can see what our parameters (\(a\) and \(\sigma_{\log}\)) say can happen:
plot(mu ~ carc, data = df)  # expected values | a
abline(a = 0, b = 1, lty = 2)  # one-to-one line... slope is parameter b
for(i in 1:10){
  abline(a = 0, b = as[i], col = "gray")
}
points(obs ~ carc, data = df, col = "red")  # simulated observations
That is a lot, but it is clear that we sometimes get observations well above the one-to-one (dashed) line.
We can get a bit more of a sense of this if we plot each data set separately.
# par(ask = TRUE) # <- asks you to hit return between each new graph
for(i in 1:10){
  plot(obs ~ carc, data = df, type = "n")  # for consistent axis limits
  points(obs ~ carc, data = df[df$d == i, ])
  abline(a = 0, b = 1, lty = 2)
}
# par(ask=FALSE) # reset
We can see that the problems occur more often when we get larger values of sdlog used in simulating observations. We have a couple of choices. We could use a different distribution to describe the possible values of sdlog, which, again, is just saying how much variation we should see around the expected value, \(\mu\). Options include the gamma or beta distributions, which don’t have such long tails. Our second choice would just be to say, you know, we’re making up stuff to see if our code works and to better understand what is reasonable; let’s file this away as sort of problematic, but probably not terribly important. I suggest we take option two for the moment so we don’t get too caught up in things that don’t matter a great deal. Besides, we can always come back to this if we want.
Simulating data from the Type II functional response
Again, our type II function response is of the form: \[
\mu_i = \frac{a \times c_i}{1+a \times c_i \times h},
\]
where \(h\) represents handling time, but everything else is the same. So this should be quick.
We can constrain handling time to be \(\geq 0\) from first principles, and I think we can also say that handling time is unlikely to be super high; that is, it is unlikely that the handling time to consume a carcass is on the order of the duration of the experiment (48 h). We can therefore assume \(1 > h > 0\).
Let’s try simulating some values. I’m going to use the curve function, which is sort of like abline() except that it plots an arbitrary function instead of just a line.
# attack rate is 0.5,
# handling time is 0.2 (20% of the duration of the experiment)
# x is the initial carcass mass
curve(0.5*x/(1 + 0.5*x*0.2), from = 0, to = 10)
abline(a = 0, b = 1, lty = 2)
It seems that this line curves away from the 1:1 line a bit, but not super fast. Play with this a bit to get a sense of how handling time affects the shape of the curve. I think you’ll see that as \(h \rightarrow 0\) this line becomes more and more like the Type I functional response.
Now let’s allow for variation in \(h\). I’m going to use a beta distribution because it is bounded between zero and one and allows some control over the shape. I’m just tweaking the shape1 and shape2 parameters (sometimes called \(\alpha\) and \(\beta\) in more theoretical frameworks) until I get something that is skewed towards low values, but does not exclude higher values.
hs <- rbeta(n = 10, shape1 = 1, shape2 = 3)
plot(0:10, 0:10, type = "n")  # set axis boundaries
abline(a = 0, b = 1)
for(i in 1:10){
  curve(as[i]*x/(1 + as[i]*x*hs[i]), add = TRUE, col = "gray")
}
If we’re happy with these values, and I guess I am, we can then use them to generate expected values for any given initial carcass mass, like we did before with the Type I functional response. In fact, let’s just recycle what we have and add a column to the data frame for h, which we can then use to generate mu2 that we can use to generate obs2.
Note that we do not have to use the same values of carc, a, and sdlog, but doing so can be useful because we know what, precisely, is changed between mu and mu2, or obs and obs2, which can help us understand the consequences of different parameters or functional response.
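The chunk itself appears to be missing here; given the variable names used just below (`h`, `mu2`, `obs2`), it presumably looked something like this sketch:

```r
df$h <- hs[df$d]  # one handling time per replicate data set
# Type II expectation, using the replicate-specific a and h
df$mu2 <- df$a * df$carc / (1 + df$a * df$carc * df$h)
# observations from the same lognormal noise model
df$obs2 <- rlnorm(n = nrow(df), meanlog = log(df$mu2), sdlog = df$sdlog)
```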
As long as we kept track of the variable names, that was actually pretty simple, right?
plot(mu2 ~ carc, data = df)  # expected values | a, h
abline(a = 0, b = 1, lty = 2)  # one-to-one line...
for(i in 1:10){
  curve(as[i]*x/(1 + as[i]*x*hs[i]), col = "gray", add = TRUE)
}
points(obs2 ~ carc, data = df, col = "red")  # simulated observations
We still have a bit of trouble with some “observations” ending up with more carcass consumed than they started with—above the 1:1 line—but this is no worse than in our prior simulation of the type I functional response. Overall, it seems like this is working reasonably well!
From simulation to Bayesian model
So, as McElreath claims, and I have found, if you can simulate data you have essentially specified the components of a Bayesian model. Let me demonstrate, first with the Type I functional response model.
While we started with the parameters and then worked our way to observations in the simulation process, we specify the model in the reverse order, beginning with the description of the distribution of the data. We simulated observations from a lognormal (using rlnorm()) given an expected value of \(\log(\mu)\) on the logarithmic scale, which I am going to call \(\mu_{\log}\) for consistency and transparency, and a standard deviation we called \(\sigma_{\log}\), so we write: \[
\begin{align}
Obs_i & \sim \text{LogNormal}(\mu_{\log_i}, \sigma_{\log}) \\
\mu_{\log_i} & = \log(\mu_i)
\end{align}
\]
Next, we develop our expectation, \(\mu_i\), from the Type I functional response, which was just the attack rate, \(a\), times the initial carcass mass, \(c_i\). Again, we can add this to our model as: \[
\mu_i = a c_i
\]
That is everything that describes our data. We simply need to specify our priors. We used distributions with particular parameters to simulate our data. We can simply use these same distributions and parameters as our priors!
\[
\begin{align}
Obs_i & \sim \text{LogNormal}(\mu_{\log_i}, \sigma_{\log}) \\
\mu_{\log_i} & = \log(\mu_i) \\
\mu_i & = a c_i \\
a & \sim \text{Normal}(0.5, 0.25) \\
\sigma_{\log} & \sim \text{Exponential}(2)
\end{align}
\] That is it. That is our Bayesian model, defined! Not so bad, right?
We can similarly define the second model for the Type II functional response: \[
\begin{align}
Obs_i & \sim \text{LogNormal}(\mu_{\log_i}, \sigma_{\log}) \\
\mu_{\log_i} & = \log(\mu_i) \\
\mu_i & = \frac{a c_i}{1+a h c_i } \\
a & \sim \text{Normal}(0.5, 0.25) \\
h & \sim \text{Beta}(1,3) \\
\sigma_{\log} & \sim \text{Exponential}(2)
\end{align}
\]
Fitting the Bayesian model to the simulated data: quadratic approximation
I’m sure you are still expecting this to get hard once we fit the model to our data. I’m here to assure you that it does not need to be very hard. Indeed, if we can write down the model we are 90% of the way to using a quadratic approximation tool that McElreath created. The code looks very much like the model definition. There are just a few small items to remember:
the model itself is wrapped up in an alist(). (Don’t fret about what it is, just be sure you use it.)
each line needs to be followed by a comma
we can use normal() or dnorm() without the x=... part, and the same for the other distributions. Somehow it feels better to use the R function names.
the data needs to be provided in a data frame or a list, after the alist() thing. The names of the columns or items in the list must match the names you use in the function.
library(rethinking)

m1 <- quap(
  alist(
    obs ~ dlnorm(log(mu), sdlog),  # likelihood
    mu <- a*carc,                  # linear model
    # priors
    a ~ dnorm(0.5, 0.25),
    sdlog ~ dexp(2)
  ),
  data = list(obs = df$obs[df$d == 1],
              carc = df$carc[df$d == 1])
)
Seriously, that’s it! (It can be helpful sometimes to specify some reasonable starting values for parameters to avoid odd initial values, especially if you have diffuse priors. But here, we’re pretty OK.)
Working with the posterior
Right…so what do we do with this? Well, we can inspect the parameter distributions. Indeed, we probably want to see if we recaptured the True values of the parameters, the ones we simulated from. This is a good internal check to make sure everything is working.
Internal consistency checks
A first step is often just to inspect what you have:
precis(m1)
mean sd 5.5% 94.5%
a 0.4177047 0.0363987 0.3595325 0.4758769
sdlog 0.3728480 0.0603017 0.2764743 0.4692218
However, this doesn’t tell you a whole heck of a lot. It would be much easier to examine a sample from the posterior, as we have before. With quap()6 it is simple:
samples1 <- extract.samples(m1)
str(samples1)
'data.frame': 10000 obs. of 2 variables:
$ a : num 0.424 0.385 0.422 0.395 0.474 ...
$ sdlog: num 0.434 0.45 0.404 0.407 0.366 ...
- attr(*, "source")= chr "quap posterior: 10000 samples from m1"
Now we can plot histograms or density plots of these parameter samples.
hist(samples1$a) # posterior of a
It can be helpful to see how the posterior differs from our prior expectation of the attack rate parameter, \(a\).
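The plotting chunk seems to have been lost here; a sketch that overlays the Normal(0.5, 0.25) prior density on the posterior samples, mirroring the sdlog plot below:

```r
hist(samples1$a, freq = FALSE, breaks = 30)         # posterior of a
curve(dnorm(x, mean = 0.5, sd = 0.25), add = TRUE)  # prior of a
```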
Notice how the data pulled our rather vague expectations about what the attack rate might be into a much more narrow, tall distribution about the values of \(a\) that are most consistent with the prior and data. Think of it as our golem learning from the data. It seems to have learned a lot!
But did it learn the right things? Recall that we actually know the True value of a used to simulate the data. Does our posterior capture this value? Let’s add in a vertical red line to show that.
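Again the chunk appears to have gone missing; since we fit the first replicate data set, the True value is `as[1]` (matching the `sdlogs[1]` pattern below):

```r
hist(samples1$a, freq = FALSE, breaks = 30)         # posterior of a
curve(dnorm(x, mean = 0.5, sd = 0.25), add = TRUE)  # prior of a
abline(v = as[1], col = "red")  # True value of a used to simulate data set 1
```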
Again, depending on the random values of a, the red line might not fall in the exact center of the posterior (because of the prior, we assumed our golem was skeptical of values near zero or one and beyond, even if those values were True), but it should be included somewhere in there.
We can do the same for the other parameter, \(\sigma_{\log}\).
hist(samples1$sdlog, freq = FALSE, breaks = 0:200/100)
curve(dexp(x, rate = 2), add = TRUE)
abline(v = sdlogs[1], col = "red")
Again, we can see that our golem learned a lot about the variation in the data, and the range of values the golem thinks are consistent with the priors and data (i.e., the posterior distribution) should include the True value, even if it isn’t the most probable value.
The influences of parameters are often not independent of one another, so it can be useful to see the probability of combinations of parameters (although not so much in this case). Or, in other words, it can be useful to plot the priors and posteriors together so we can see how they co-vary.
Working in base R graphics is a pain, but McElreath has some handy functions that help, for instance, this contour plot.
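That chunk is missing here; as a stand-in sketch (I am not sure which helper function the original used), base R plus `MASS::kde2d` produces the same kind of contour plot, with the prior calculated on a grid since the priors for \(a\) and \(\sigma_{\log}\) are independent:

```r
library(MASS)  # for the 2-D kernel density estimate, kde2d

# contours of the joint posterior samples of a and sdlog
post_dens <- kde2d(samples1$a, samples1$sdlog, n = 50)
contour(post_dens, xlab = "a", ylab = "sdlog")

# prior density is just the product of the independent priors,
# calculated over a grid of values
a_grid <- seq(0, 1, length.out = 50)
s_grid <- seq(0.01, 1.5, length.out = 50)
prior_z <- outer(dnorm(a_grid, mean = 0.5, sd = 0.25),
                 dexp(s_grid, rate = 2))
contour(a_grid, s_grid, prior_z, col = "blue", add = TRUE)
```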
Note that since we didn’t have the prior probabilities of any combination of parameters in any particular place, we had to calculate them. We can also see how much our posterior changed from the prior.
We can also calculate means and modes and intervals, just like you often get for parameter estimates.
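The chunk is missing here, but the rethinking package has helpers for exactly this; a sketch:

```r
mean(samples1$a)                           # posterior mean of a
rethinking::chainmode(samples1$a)          # posterior mode (from a density estimate)
rethinking::HPDI(samples1$a, prob = 0.9)   # highest posterior density interval
rethinking::PI(samples1$a, prob = 0.9)     # central (percentile) interval
```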
We can also plot the relationships between initial and consumed carcass densities that are most consistent with the priors and data (i.e., most likely in the posterior). In this case, it’s pretty simple because our expected value, \(\mu_i\), is just a simple straight line, so I’ll use the short-cut of the abline() function.
plot(x = 0:10, y = 0:10, type = "l", lty = 2)
# some draws from the prior of a
for(i in 1:20){
  abline(a = 0, b = as[i], col = rgb(0, 0, 1, alpha = 0.2))
}
# draws from the posterior of a
for(i in 1:100){
  abline(a = 0, b = samples1$a[i], col = rgb(0, 0, 0, alpha = 0.2))
}
We can see that the relationship is much narrower than it might have been.
This is a bit, um, ad hoc. Why don’t we formalize a way to calculate the range of possible lines that are most consistent with our priors and data? That is, let’s find a consistency (or credible or confidence) envelope around the average line that includes, say, 90% of the posterior probability.
The first step, getting out the expected value for a range of predictor variable(s), is simple with the link() function.
# Generate a range of values along the x-axis
new_carcs <- seq(0.1, 10, length.out = 50)

# for each value along the x-axis, generate 10,000 expected values,
# each based on a draw of the parameter a
possible_lines <- link(fit = m1, data = data.frame(carc = new_carcs), n = 10000)
str(possible_lines)
num [1:10000, 1:50] 0.0378 0.0428 0.0433 0.0462 0.0373 ...
If you inspect this (e.g., head(possible_lines)) you will see that there are 50 columns representing the 50 positions along the x-axis, each with a y-value (expected carcass consumption) from the 10,000 draws from the posterior of \(a\). Think of it as one line per row, each line (and row) corresponding to one sample from the posterior. That might be a bit too much to plot, but we can use a quantile function to find the 5th and 95th percentiles, or similar, among the observations at each point along the x-axis.
If you inspect this you see the range of the internal 90% of possible observations. However, it is wide rather than long. Thankfully, R makes it easy to transpose a matrix:
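The code for these two steps (the interval at each x-position, then the transpose into a data frame) appears to be missing; judging from the analogous Type II chunk later on, it presumably looked something like this (`interval_lines` is my assumed name; `pred_lines` is used below):

```r
# at each position along the x-axis, find the interval holding
# the central 90% of the expected values
interval_lines <- apply(X = possible_lines, MARGIN = 2,
                        FUN = rethinking::PI, prob = 0.9)

# transpose and convert to a data frame
pred_lines <- as.data.frame(t(interval_lines))
# add in the x-values,
pred_lines$carc <- new_carcs
# and the mean prediction at each x-value
pred_lines$mean <- apply(X = possible_lines, MARGIN = 2, FUN = mean)
head(pred_lines)
```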
So now we have a data frame with all of the values we want to plot. The only issue is that it is hard to call variables with names like 5%, so we first need to rename those columns.
# rename 5% and 95% since R will choke on those
names(pred_lines)[1:2] <- c("low", "hi")

plot(0:10, 0:10, type = "l", lty = 2)  # 1:1 line
lines(mean ~ carc, data = pred_lines, col = "red")          # mean prediction
lines(low ~ carc, data = pred_lines, col = "red", lty = 2)  # lower
lines(hi ~ carc, data = pred_lines, col = "red", lty = 2)   # upper
# plot actual (simulated) observations
points(x = carc, y = obs)
So in this figure the solid line is the predicted relationship between the initial and consumed carcass mass from the mean of the posterior distribution and the dashed lines represent the 5th and 95th percentiles of the posterior.
Posterior predictive checks
It can be useful to plot the range of likely values of the observations. After all, we might get the gist of the relationship right, but really miss the mark in the data-generating process. For instance, perhaps our actual (simulated) observations often lie outside of the range of what our golem expects. These more extreme observations would tend to pull our golem’s understanding to these extremes. More on this in the future. But for now, let’s see how we could produce and plot possible observations. It will follow a very similar path to our method of generating the consistency envelope.
Here again McElreath has created a helper function, sim(), that does a lot of the hard work for us. (However, looking back at our previous version of this section, it wasn’t very hard to do by hand!)
# Generate a range of values along the x-axis
new_carcs <- seq(0.1, 10, length.out = 50)

# for each value along the x-axis, generate 10,000 possible observations,
# each based on a draw of a and sdlog
possible_obs <- sim(fit = m1, data = data.frame(carc = new_carcs), n = 10000)
str(possible_obs)
num [1:10000, 1:50] 0.0509 0.0491 0.0773 0.0526 0.0301 ...
As above, if you inspect this (e.g., head(possible_obs)) you will see that there are 50 columns representing the 50 positions along the x-axis, but now each row is a possible set of observations corresponding to a draw from the posterior distribution of \(a\) and \(\sigma_{\log}\). Again, we can use the quantile() or PI() functions to find a particular interval of these observations.
interval <- apply(X = possible_obs, MARGIN = 2,
                  FUN = rethinking::PI, prob = 0.9)
# transpose and turn into a data frame
pred_obs <- as.data.frame(t(interval))
# add in the x-values,
pred_obs$carc <- new_carcs
# and the predicted line based on the mean value of a from the posterior
pred_obs$pred <- mean(samples1$a)*pred_obs$carc

# and plot this **posterior predictive interval**
# rename 5% and 95% since R will choke on those
names(pred_obs)[1:2] <- c("low", "hi")
plot(0:10, 0:10, type = "l", lty = 2)  # 1:1 line
lines(pred ~ carc, data = pred_obs, col = "red")          # mean prediction
lines(low ~ carc, data = pred_obs, col = "red", lty = 3)  # lower
lines(hi ~ carc, data = pred_obs, col = "red", lty = 3)   # upper
# plot actual (simulated) observations
points(x = carc, y = obs)
We should see that most, but not necessarily all of the observations fall within our 90% prediction interval. We should also not see systematic deviations from what the model says is likely, which would indicate some sort of bias or model misspecification. More on this later.
Hint
We can follow the same methods to simulate possible observations from the prior distribution, as well. It is one thing to get a strong sense of the average or expected values based on prior values, but sometimes we completely flub the part of the model that specifies the probability of observing particular observations, making the noise much too small or much too large.
So at this point it seems that the model can reasonably recover the True values of the parameters (at least in my simulation) and seems to produce data that look like our (simulated) data. Does this work repeatedly for different data sets? It would be easy to check. Simply change the data set (e.g., df[df$d == 2, ]) and re-run the code. Ideally our model will work over a variety of possible data sets. If there are edge cases where things break, it can be useful to sort out why. Was it because of our priors? Were there limitations or implications of the way we produced expected values (i.e., \(\mu_i\)) that are coming to bite us? In essence, we want to have some confidence that our model is working how we think it should work, even in the edge cases.
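As a sketch of that check (`m1_d2` is just an illustrative name), we can refit the same model to the second replicate data set and compare the estimates against the True values, `as[2]` and `sdlogs[2]`:

```r
m1_d2 <- quap(
  alist(
    obs ~ dlnorm(log(mu), sdlog),  # same model as m1
    mu <- a*carc,
    a ~ dnorm(0.5, 0.25),
    sdlog ~ dexp(2)
  ),
  # only the data subset changes
  data = list(obs = df$obs[df$d == 2],
              carc = df$carc[df$d == 2])
)
precis(m1_d2)
```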
Perspective
Is this a lot of work? Perhaps it is, but by this point we have a lot of confidence that we understand what we can expect and that our model is working as designed. It may feel like overkill in a simple case like this, but it is essential for more complex models where our intuition often fails us and simple inspection of parameter values yield minimal insights. So let’s practice in the simple case, getting a sense of the mechanics, before we move into less-charted waters. And anyway, the tools we (will) build to simplify things here should work in these more complex cases. If nothing else, I think working through these steps helps keep us in the driver’s seat of our analyses.
Repeating the process with the Type II functional response
We’ll repeat the process of producing, fitting, and checking our model with the Type II functional response. It will be quicker, but hopefully this repetition makes things feel more familiar. Again, the model is: \[
\begin{align}
Obs_i & \sim \text{LogNormal}(\mu_{\log_i}, \sigma_{\log}) \\
\mu_{\log_i} & = \log(\mu_i) \\
\mu_i & = \frac{a c_i}{1+a h c_i } \\
a & \sim \text{Normal}(0.5, 0.25) \\
h & \sim \text{Beta}(1,3) \\
\sigma_{\log} & \sim \text{Exponential}(2)
\end{align}
\]
We’ll again use the observations from the first simulated data set as our data, but this time use the observations simulated from the Type II model.
m2 <- quap(
  alist(
    obs ~ dlnorm(log(mu), sdlog),
    mu <- a*carc/(1 + a*carc*h),  # changed
    # priors
    a ~ dnorm(0.5, 0.25),
    h ~ dbeta(1, 3),  # new
    sdlog ~ dexp(2)
  ),
  data = list(obs = df$obs2[df$d == 1],
              carc = df$carc[df$d == 1])
)
precis(m2)
mean sd 5.5% 94.5%
a 0.5759561 0.10967401 0.40067584 0.7512363
h 0.2272782 0.09250118 0.07944347 0.3751130
sdlog 0.4494439 0.07269035 0.33327067 0.5656171
Again, it is pretty darn simple. We changed our model of mu and added a new parameter. Otherwise we just had to provide the new observations and we were good to go!
I’m seeing some values of \(h\) that are below zero. We specified that \(h\) should be beta-distributed, which implies it is between zero and one, inclusive. What gives? This is because we are using a quadratic approximation, which assumes the posteriors of the parameters are normally distributed. Thus we can get tails that go outside of the bounds we assume. We won’t see this problem when we get to the full MCMC engine, but we may here and there with quap(). I wouldn’t stress about it since we know we’re just approximating the posterior. But be warned.
We can also see the relationship between \(a\) and \(h\) in the posterior7.
samples2 <- extract.samples(m2)  # draw samples from the m2 posterior

# posterior
plot(samples2$a, samples2$h, pch = 16, col = rgb(0, 0, 1, 0.05))
Notice that there is a positive relationship such that larger estimates of attack rates are associated with larger estimates of handling time. Can you think of why?
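One way to build intuition (with illustrative parameter values, not values drawn from the posterior): raising the attack rate \(a\) while also raising the handling time \(h\) produces nearly the same expected curve over the observed range of carcass masses, so the data cannot easily tell the two combinations apart.

```r
# Type II expected consumption: mu = a*c / (1 + a*h*c)
type2 <- function(a, h, c) a * c / (1 + a * h * c)

c_vals <- c(1, 2, 5, 8, 10)  # the carcass numbers used in the experiment

# A lower-a/lower-h pair and a higher-a/higher-h pair (illustrative values)
mu_lo <- type2(a = 0.50, h = 0.20, c = c_vals)
mu_hi <- type2(a = 0.65, h = 0.29, c = c_vals)

# Differences on the log scale are small relative to the residual sd (~0.45),
# so observations could not easily distinguish these two parameter pairs.
log(mu_hi) - log(mu_lo)
```

A higher attack rate steepens the curve at low carcass mass, and a higher handling time pulls the curve down at high carcass mass; together they roughly cancel, which is what produces the positive ridge in the joint posterior.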
Let’s now plot our understanding of the relationships between initial and consumed carcasses most consistent with the prior and data. Or, better, we can plot a consistency envelope around our predicted line. Here again we can use link().
# Generate a range of values along the x-axis
new_carcs <- seq(0.1, 10, length.out = 50)
# for each value along the x-axis, generate 10,000 possible expected values,
# each based on a draw of the parameters a and h
possible_lines2 <- link(fit = m2, data = data.frame(carc = new_carcs), n = 10000)
# at each position then find the upper and lower 90% interval
interval_lines2 <- apply(X = possible_lines2, MARGIN = 2,
                         FUN = rethinking::PI, prob = 0.9)
# transpose and convert to a data frame
pred_lines2 <- as.data.frame(t(interval_lines2))
# add in the x-values,
pred_lines2$carc <- new_carcs
# and the predicted line based on the mean value of a & h from the posterior
pred_lines2$pred <- apply(X = possible_lines2, MARGIN = 2, FUN = mean)
# rename 5% and 95% since R will choke on those
names(pred_lines2)[1:2] <- c("low", "hi")

plot(0:10, 0:10, type = "l", lty = 2)                        # 1:1 line
lines(pred ~ carc, data = pred_lines2, col = "red")          # mean prediction
lines(low ~ carc, data = pred_lines2, col = "red", lty = 2)  # lower
lines(hi ~ carc, data = pred_lines2, col = "red", lty = 2)   # upper
# plot actual (simulated) observations
points(x = df$carc[df$d == 1], y = df$obs2[df$d == 1])
And finally, we can calculate and add a posterior predictive interval describing the range of observations most consistent with our prior and data.
# Same stuff from prior chunk
plot(0:10, 0:10, type = "l", lty = 2)                        # 1:1 line
lines(pred ~ carc, data = pred_lines2, col = "red")          # mean prediction
lines(low ~ carc, data = pred_lines2, col = "red", lty = 2)  # lower
lines(hi ~ carc, data = pred_lines2, col = "red", lty = 2)   # upper
# plot actual (simulated) observations
points(x = df$carc[df$d == 1], y = df$obs2[df$d == 1])

# for each value along the x-axis, generate 10,000 possible observations,
# each based on a draw of a and sdlog
possible_obs2 <- sim(fit = m2, data = data.frame(carc = new_carcs), n = 1e4)
# at each position then find the upper and lower 90% interval
interval2 <- apply(X = possible_obs2, MARGIN = 2,
                   FUN = rethinking::PI, prob = 0.9)
# transpose and convert to a data frame
pred_obs2 <- as.data.frame(t(interval2))
# add in the x-values
pred_obs2$carc <- new_carcs
# rename 5% and 95% since R will choke on those
names(pred_obs2)[1:2] <- c("low", "hi")
# plot posterior predictive interval
lines(low ~ carc, data = pred_obs2, col = "red", lty = 3)  # lower
lines(hi ~ carc, data = pred_obs2, col = "red", lty = 3)   # upper
The vast majority of that code is the same. So, again, as long as you keep track of what’s what, you don’t need to reinvent the wheel each time you construct or run a model.
Also, notice how quick and easy link() and sim() make it to generate expected values and possible observations, respectively. But don’t lose track of what they are doing. You can also go back to the prior version of this script to see how we did it by hand.
Fitting each model to the other’s data
So far we have fit each model to the data it produced, as a check on how well it works. But in the end, we are hoping to fit both models to the same real data and see which model is most consistent with those real observations. So it seems useful to fit each model to observations produced by the other model. In particular, let’s try fitting the Type I model to the data produced by the Type II model to see if we would even notice any systematic issues. (We’ll come back to metrics of model fit, later.)
Let’s simply re-fit the Type I model to the Type II data.
Again, we can use the same machinery as before. Indeed, we can simply update() the m1 model with new data! (This at least works with quap() and most models in R, like lm(). I’m not sure of its entire range of application, though.)
m1.2 <- update(m1, data = list(obs = df$obs2[df$d == 1],
                               carc = df$carc[df$d == 1]))
samples1.2 <- extract.samples(m1.2)
hist(samples1.2$sdlog, freq = FALSE, breaks = 0:200/100)
curve(dexp(x, rate = 2), add = TRUE)
abline(v = sdlogs[1], col = "red")  # the True value used in the simulation
So you might be seeing that the estimate of the \(\sigma_{\log}\) parameter is a bit higher than the True parameter. Can you imagine why? I would guess that this is because the data are deviating more from the straight line expectation of the Type I model since they actually come from a saturating relationship.
Probably what we’re most interested in is seeing whether the wrong model can generate predictions that are reasonably consistent with the actual (simulated) data. So let’s first plot these relationships between initial and consumed carcasses most consistent with the prior and data. link() to the rescue!
# Generate a range of values along the x-axis
new_carcs <- seq(0.1, 10, length.out = 50)
# for each value along the x-axis, generate 10,000 possible expected values,
# each based on a draw of the parameter, a
possible_lines1.2 <- link(fit = m1.2, data = data.frame(carc = new_carcs), n = 10000)
# at each position then find the upper and lower 90% interval
interval_lines1.2 <- apply(X = possible_lines1.2, MARGIN = 2,
                           FUN = rethinking::PI, prob = 0.9)
# transpose and convert to a data frame
pred_lines1.2 <- as.data.frame(t(interval_lines1.2))
# add in the x-values,
pred_lines1.2$carc <- new_carcs
# and the predicted line based on the mean value of a from the posterior
pred_lines1.2$pred <- apply(X = possible_lines1.2, MARGIN = 2, FUN = mean)
# rename 5% and 95% since R will choke on those
names(pred_lines1.2)[1:2] <- c("low", "hi")

plot(0:10, 0:10, type = "l", lty = 2)                          # 1:1 line
lines(pred ~ carc, data = pred_lines1.2, col = "red")          # mean prediction
lines(low ~ carc, data = pred_lines1.2, col = "red", lty = 2)  # lower
lines(hi ~ carc, data = pred_lines1.2, col = "red", lty = 2)   # upper
# plot actual (simulated) observations
points(x = df$carc[df$d == 1], y = df$obs2[df$d == 1])
And then let us see what sort of observations we would expect under this model, using sim().
# Generate a range of values along the x-axis
new_carcs <- seq(0.1, 10, length.out = 50)
# for each value along the x-axis, generate 10,000 possible observations,
# each based on a draw of a and sdlog
possible_obs1.2 <- sim(fit = m1.2, data = data.frame(carc = new_carcs), n = 10000)
# With this huge matrix, calculate interval at each x-value
interval1.2 <- apply(X = possible_obs1.2, MARGIN = 2,
                     FUN = rethinking::PI, prob = 0.9)
# transpose and convert to data frame
pred_obs1.2 <- as.data.frame(t(interval1.2))
# add in x-values and predicted mean
pred_obs1.2$carc <- new_carcs
pred_obs1.2$pred <- apply(X = possible_obs1.2, MARGIN = 2, FUN = mean)
# rename 5% and 95% since R will choke on those
names(pred_obs1.2)[1:2] <- c("low", "hi")

plot(0:10, 0:10, type = "l", lty = 2)                        # 1:1 line
lines(pred ~ carc, data = pred_obs1.2, col = "red")          # mean prediction
lines(low ~ carc, data = pred_obs1.2, col = "red", lty = 3)  # lower
lines(hi ~ carc, data = pred_obs1.2, col = "red", lty = 3)   # upper
# plot actual (simulated) observations
points(x = df$carc[df$d == 1], y = df$obs2[df$d == 1])
So at this point it does seem our model can predict data that are consistent with our actual data, but it does that by assuming a large standard deviation around the expected (or mean) line. It also sort of looks like the model tends to under-predict carcass consumption at low amounts of initial carcass mass and over-predict consumption at higher initial masses. Let’s explore this deviation from the expectation to look for systematic bias, which would be indicative of a poorly-fitting model.
First, for each actual (simulated) observation, let’s generate a prediction, here based on the average value of \(a\) across the posterior. We could, given the simplicity of the model, simply calculate the values by hand (e.g., using mean(samples1.2$a)*df$carc[df$d==1]), but we can also use link() to generate predicted values from draws in the posterior and then average them. This latter approach is a bit simpler when the model gets more complex. Note that when we don’t give link new data, it just predicts expected values for the 18 observations in the data given to the model.
predictedI <- link(fit = m1.2)
predictedI <- apply(X = predictedI, MARGIN = 2, FUN = mean)
Then let’s plot the difference between the actual (simulated) observations and this prediction against the initial carcass mass.
plot(x = carc, y = df$obs2[df$d == 1] - predictedI)
abline(a = 0, b = 0, lty = 2)  # a line at zero to make this pattern clearer
So yes, my intuition about underpredicting for low initial masses and overpredicting for high initial masses seems to be supported. In other words, this Type I model is struggling to explain the observations simulated from the Type II model. This would be bad for our model if that was our goal, but it is good for our goal of correctly differentiating the “correct” functional relationship between carcass consumption and initial carcass mass.
This comparison is more compelling if we do the same thing for the Type II model.
predictedII <- mean(samples2$a)*carc / (1 + mean(samples2$a)*carc*mean(samples2$h))
# OR
predictedII <- link(fit = m2)
predictedII <- apply(X = predictedII, MARGIN = 2, FUN = mean)

plot(x = carc, y = df$obs2[df$d == 1] - predictedII)
abline(a = 0, b = 0, lty = 2)
Right! So whatever systematic bias we observed when fitting the wrong model is not apparent here. It’s not exactly a tight fit in that there is still a lot of scatter around the line! It’s only that the scatter above versus below the zero line seems equivalent wherever we are along the x-axis.
We could also see the reverse, how well the Type II model predicts data simulated from the Type I process, but I’ll set that aside for now.
I’ll also set aside metrics of “fit” for now (because I know you’re all wondering how we can measure the fit of both models to the data). More on that later. For now, onward!
Real data!
All of that was warm-up, model and tool development and testing. It was all meant so that we could be confident we knew how things worked. Now we are ready to fit the model to our actual data8!
The quap() function requires that the names of the data are the same as the names of the variables in the model. Since we used carc and obs in the model, I’ve renamed the variables in the data frame to match (from “initial” and “consumed”, respectively). If you don’t match names, the model will run, but it will run as if there were no data. That is, the priors will not be updated!
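For example, a minimal sketch of the renaming step (the raw data frame here is made up for illustration; only the column names match the description above):

```r
# Hypothetical stand-in for the real data frame, with the original column names
raw <- data.frame(initial = c(1, 2, 5), consumed = c(0.4, 0.7, 1.2))

# quap() matches data to the model by name, so rename to the names in the model
renamed <- data.frame(carc = raw$initial, obs = raw$consumed)
names(renamed)  # "carc" "obs"
```

With the names matched, the data actually enter the likelihood; with mismatched names the fit silently returns the priors.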
We’ve done the hard work of building the code to fit the two models, so let’s simply recycle that code here, being sure we’re using the right variable names.
The type I model
I’m labeling everything with an I for the Type I model. Again, the code is essentially the same, following the same steps as before:
mI <- update(m1, data = real)
precis(mI)
mean sd 5.5% 94.5%
a 0.2920617 0.04535039 0.2195830 0.3645404
sdlog 0.6232833 0.10439004 0.4564478 0.7901187
We can then examine our posterior distributions for the two parameters of the model.
And finally we probably want to examine the predictions of the fit model against initial carcass mass as well as the posterior predictive distribution (i.e., what data would look like if it came from this model, given parameter and sampling uncertainty) and real (for real!) data.
# Generate a range of values along the x-axis
new_carcs <- seq(min(real$carc), max(real$carc), length.out = 50)
# for each value along the x-axis, generate 10,000 possible expected values,
# each based on a draw of a
possible_linesI <- link(fit = mI, data = data.frame(carc = new_carcs), n = 10000)
# at each position then find the upper and lower 90% interval
interval_linesI <- apply(X = possible_linesI, MARGIN = 2,
                         FUN = rethinking::PI, prob = 0.9)
# transpose and convert to a data frame
pred_linesI <- as.data.frame(t(interval_linesI))
# add in the x-values,
pred_linesI$carc <- new_carcs
# and the predicted line based on the mean of the posterior
pred_linesI$pred <- apply(X = possible_linesI, MARGIN = 2, FUN = mean)
# rename 5% and 95% since R will choke on those
names(pred_linesI)[1:2] <- c("low", "hi")

plot(0:10, 0:10, type = "l", lty = 2,
     xlab = "Initial carcass mass (g)",
     ylab = "Carcass mass consumed in 48h (g)")              # 1:1 line
lines(pred ~ carc, data = pred_linesI, col = "red")          # mean prediction
lines(low ~ carc, data = pred_linesI, col = "red", lty = 2)  # lower
lines(hi ~ carc, data = pred_linesI, col = "red", lty = 2)   # upper
# plot actual (real!) observations
points(x = real$carc, y = real$obs)

# NOW the posterior predictive interval
# for each value along the x-axis, generate 10,000 possible observations,
# each based on a draw of a and sdlog
possible_obsI <- sim(fit = mI, data = data.frame(carc = new_carcs), n = 10000)
# With this huge matrix, calculate interval at each x-value
intervalI <- apply(X = possible_obsI, MARGIN = 2,
                   FUN = rethinking::PI, prob = 0.9)
# transpose and convert to data frame
pred_obsI <- as.data.frame(t(intervalI))
# add in x-values
pred_obsI$carc <- new_carcs
# rename 5% and 95% since R will choke on those
names(pred_obsI)[1:2] <- c("low", "hi")
lines(low ~ carc, data = pred_obsI, col = "red", lty = 3)  # lower
lines(hi ~ carc, data = pred_obsI, col = "red", lty = 3)   # upper
Whoo are those posterior predictive intervals wide!
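Part of the reason is the lognormal likelihood itself: the width of its intervals scales with the median, so with \(\sigma_{\log}\) around 0.62 (the posterior mean above), even a modest expected value gets a broad interval. A quick sketch, plugging in an illustrative expected consumption of 2 g:

```r
# Width of a 90% lognormal interval around an expected value mu.
# sdlog is the posterior mean from precis(mI) above; mu is illustrative.
sdlog <- 0.62
mu <- 2  # grams consumed; in this model mu is the median of the lognormal
lo <- qlnorm(0.05, meanlog = log(mu), sdlog = sdlog)
hi <- qlnorm(0.95, meanlog = log(mu), sdlog = sdlog)
c(lo, hi)
```

The upper bound is more than seven times the lower one, so wide posterior predictive intervals are exactly what this model implies.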
Let’s also repeat our examination of the residuals (observed-expected).
samplesI <- extract.samples(mI)  # draw samples from the posterior
predicted <- mean(samplesI$a) * real$carc
plot(x = real$carc, y = real$obs - predicted)
abline(a = 0, b = 0, lty = 2)  # a line at zero to make this pattern clearer
So we see a very clear trend of underpredicting values of consumption at low initial carcass masses and overpredicting values of consumption at higher initial carcass masses. This is a model that is failing to produce the pattern in the data!
The type II model
We’ll follow the same workflow, just keeping the second model in mind and labeling things with II for the Type II model.
mII <- update(m2, data = real)  # If this doesn't run at first, just try again
precis(mII)
mean sd 5.5% 94.5%
a 0.6418410 0.12856602 0.4363676 0.8473143
h 0.5876859 0.12079101 0.3946385 0.7807332
sdlog 0.3673495 0.06513815 0.2632462 0.4714529
Then we can examine the posteriors of the parameters.
That’s what we expected from our simulated data, so that’s nice.
Finally, let’s repeat the posterior predictive plot.
# Generate a range of values along the x-axis; same as above
new_carcs <- seq(min(real$carc), max(real$carc), length.out = 50)
# for each value along the x-axis, generate 10,000 possible expected values,
# each based on a draw of a and h
possible_linesII <- link(fit = mII, data = data.frame(carc = new_carcs), n = 1e4)
# at each position then find the upper and lower 90% interval
interval_linesII <- apply(X = possible_linesII, MARGIN = 2,
                          FUN = rethinking::PI, prob = 0.9)
# transpose and convert to a data frame
pred_linesII <- as.data.frame(t(interval_linesII))
# add in the x-values,
pred_linesII$carc <- new_carcs
# and the predicted line based on the mean value of a & h from the posterior
pred_linesII$pred <- apply(X = possible_linesII, MARGIN = 2, FUN = mean)
# rename 5% and 95% since R will choke on those
names(pred_linesII)[1:2] <- c("low", "hi")

plot(0:10, 0:10, type = "l", lty = 2,
     xlab = "Initial carcass mass (g)",
     ylab = "Carcass mass consumed in 48h (g)")                # 1:1 line
lines(pred ~ carc, data = pred_linesII, col = "blue")          # mean prediction
lines(low ~ carc, data = pred_linesII, col = "blue", lty = 2)  # lower
lines(hi ~ carc, data = pred_linesII, col = "blue", lty = 2)   # upper
# plot actual (real!) observations
points(x = real$carc, y = real$obs)

# NOW the posterior predictive interval
# for each value along the x-axis, generate 10,000 possible observations,
# each based on a draw of a, h, and sdlog
possible_obsII <- sim(fit = mII, data = data.frame(carc = new_carcs), n = 1e4)
# With this huge matrix, calculate interval at each x-value
intervalII <- apply(X = possible_obsII, MARGIN = 2,
                    FUN = rethinking::PI, prob = 0.9)
# transpose and convert to data frame
pred_obsII <- as.data.frame(t(intervalII))
# add in x-values
pred_obsII$carc <- new_carcs
# rename 5% and 95% since R will choke on those
names(pred_obsII)[1:2] <- c("low", "hi")
lines(low ~ carc, data = pred_obsII, col = "blue", lty = 3)  # lower
lines(hi ~ carc, data = pred_obsII, col = "blue", lty = 3)   # upper
And for fun, let’s add in the lines from the other model:
plot(0:10, 0:10, type = "l", lty = 2,
     xlab = "Initial carcass mass (g)",
     ylab = "Carcass mass consumed in 48h (g)")                # 1:1 line
points(x = real$carc, y = real$obs)
# Type II model
lines(pred ~ carc, data = pred_linesII, col = "blue")          # mean prediction
lines(low ~ carc, data = pred_linesII, col = "blue", lty = 2)  # lower
lines(hi ~ carc, data = pred_linesII, col = "blue", lty = 2)   # upper
lines(low ~ carc, data = pred_obsII, col = "blue", lty = 3)    # lower
lines(hi ~ carc, data = pred_obsII, col = "blue", lty = 3)     # upper
# Type I model
lines(pred ~ carc, data = pred_linesI, col = "red")            # mean prediction
lines(low ~ carc, data = pred_linesI, col = "red", lty = 2)    # lower
lines(hi ~ carc, data = pred_linesI, col = "red", lty = 2)     # upper
lines(low ~ carc, data = pred_obsI, col = "red", lty = 3)      # lower
lines(hi ~ carc, data = pred_obsI, col = "red", lty = 3)       # upper
That looks like a pretty good fit (although the plot is pretty busy!). The blue consistency envelope covers a much narrower range of possible lines, and the blue posterior predictive interval a much narrower range of possible observations. Moreover, it seems like the Type II model (blue) is capturing the general shape of the data much better than the Type I.
Let’s examine our residuals (observations - expected) to look for biases.
predicted <- link(mII)
predicted <- apply(X = predicted, MARGIN = 2, FUN = mean)
plot(x = real$carc, y = real$obs - predicted)
abline(a = 0, b = 0, lty = 2)  # a line at zero to make this pattern clearer
That looks much, much better than the residuals from the Type I model.
So, to recap, it looks like our data are better described by a Type II model, although the predictions of the two models overlap quite a lot over the full range of our data. That said, there appears to be a large handling time:
samplesII <- extract.samples(mII)  # draw samples from the posterior
mean(samplesII$h)
[1] 0.5884788
rethinking::PI(samplesII$h, prob=0.9)
5% 95%
0.3929752 0.7867688
Since this is handling time in units of the duration of the experiment, which was 48 h, we can multiply these numbers by 48 to express the handling time in hours (per gram).
mean(samplesII$h)*48
[1] 28.24698
rethinking::PI(samplesII$h, prob=0.9)*48
5% 95%
18.86281 37.76490
So that, from a biological point of view, seems substantial. We can use these estimates to make predictions about how many carcasses (or how much carcass mass) might be removed by scavengers under different scenarios, having some confidence that a model that includes handling time is much more reasonable. However, we’ll have to return to the concept of model comparison later. For now, perhaps we can compare parameter estimates and think about, for instance, what the much larger estimate of \(\sigma_{\log}\) for Model I means.
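As a sketch of such a prediction, we can plug the posterior means from precis(mII) above (a ≈ 0.64, h ≈ 0.59) into the Type II equation. This gives point estimates only; link() and sim() with new data would carry the full parameter and sampling uncertainty.

```r
# Point prediction from the Type II fit, using posterior means from precis(mII)
a <- 0.64
h <- 0.59
consumed_48h <- function(c) a * c / (1 + a * h * c)

# Predicted grams consumed in 48 h at 2, 5, and 10 g of initial carcass mass
consumed_48h(c(2, 5, 10))

# Note the saturation implied by the handling time: the asymptote is 1/h,
# roughly 1.7 g per 48 h no matter how much carcass is available.
1 / h
```

This is the practical upshot of the large handling time: a single scavenger tops out well below the carcass masses available in a die-off.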
To better understand how animals might forage, what strategies they would take, etc., he had undergraduate volunteers pick up discs of sandpaper (I think… my recollection is imperfect) with tacks or their fingers while blindfolded, etc., counting how many they could get in a certain amount of time. He noticed the empirical patterns, which others then built equations to describe and even explain; I don’t think he was that mathematically inclined. Anyway, they’ve been hugely influential and useful, including here.↩︎
There are actually a lot of equations that produce more or less similar curves, starting with different assumptions. The Michaelis-Menten equation is one common approach. It just goes to show you that a hypothesis can be represented by multiple models and a model can be consistent with multiple equations.↩︎
We can do these things, but they’re beyond our current scope.↩︎
If we wanted to enforce our consumed carcass values to be \(\leq\) the initial carcass values we could model the proportion of the carcass consumed and use a beta distribution to describe the variation.↩︎
NB: using the same plotting functions as before caused…issues, so we’ll just do this hacky version for now. Since we’re not going to keep using grid-approximation for too much longer, it’s not worth spending too much time figuring out how to circumvent these issues.↩︎
Actually, we had data from two ponds, but we have not yet seen how to deal with this issue, so we’ll begin with just the single pond.↩︎